WPB-25915 add timeout and duration metric for conversation migration#5244
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds timeout handling and duration observability for background-worker conversation migrations to make stuck per-conversation attempts visible and fail-fast.
Changes:
- Adds optional
timeouttoMigrationOptionsand applies it to per-conversation migration attempts. - Registers and records a Prometheus histogram for conversation migration attempt durations by outcome.
- Documents the new timeout setting and updates package dependencies for
Timeouttime-unit support.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
services/background-worker/src/Wire/PostgresMigrations.hs |
Registers and passes the new conversation migration duration histogram. |
services/background-worker/src/Wire/BackgroundWorker.hs |
Updates existing MigrationOptions constructor calls for the new timeout field. |
libs/wire-subsystems/src/Wire/Migration.hs |
Adds timeout configuration and timeout exception type. |
libs/wire-subsystems/src/Wire/ConversationStore/Migration.hs |
Applies per-conversation timeout logic and records duration metrics. |
libs/types-common/types-common.cabal |
Adds polysemy-time dependency. |
libs/types-common/src/Util/Timeout.hs |
Derives TimeUnit for Timeout. |
libs/types-common/default.nix |
Adds polysemy-time to Nix dependencies. |
docs/src/developer/reference/config-options.md |
Documents the new migration timeout option. |
changelog.d/5-internal/WPB-25915 |
Adds a changelog entry file, but it is currently empty. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Akshay Mankar <akshay@wire.com>
Co-authored-by: Akshay Mankar <akshay@wire.com>
Co-authored-by: Akshay Mankar <akshay@wire.com>
Co-authored-by: Akshay Mankar <akshay@wire.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
4c4c132 to
cd421ac
Compare
|
@battermann the CI was flaky and it lacks an approval from @akshaymankar |
akshaymankar
left a comment
There was a problem hiding this comment.
Looks good. Nits in comments.
| migrateConversationWithLock timeout_ cid = | ||
| withExclusiveMigrationLockAndTimeout timeout_ migDuration [cid] $ migrateConversation cid migCounter |
There was a problem hiding this comment.
This seems to be the only migration where we pulled out the withExclusiveMigrationLockAndTimeout outside the migrate<Thing> function. Why so?
There was a problem hiding this comment.
I checked the code again. It looks correct for all migrations. For the other migrations the code is inlined and for conversations it is extracted into a helper function. That is all. There is no reason.
Let me know if you want this changed, otherwise I will merge. 🙏
There was a problem hiding this comment.
All good, I was just curious what happened. ![]()
Co-authored-by: Akshay Mankar <akshay@wire.com>
https://wearezeta.atlassian.net/browse/WPB-25915
I have tested that it works. However, those test cannot be committed, because they hook into the production migration code to simulate the blocking conversation migration.
Checklist
changelog.d